Explanation in Computational Stylometry
Abstract
Computational stylometry, as in authorship attribution or profiling, has large potential for applications in diverse areas: literary science, forensics, language psychology, sociolinguistics, even medical diagnosis. Yet many of the basic research questions of this field are not studied systematically, or even at all. In this paper we examine these problems and suggest that a reinterpretation of current and historical methods in the framework and methodology of machine learning of natural language processing would be helpful. We also argue for more research attention to explanation in computational stylometry, as opposed to purely quantitative evaluation measures, and propose a strategy for data collection and analysis for achieving progress in computational stylometry. We also introduce a fairly new application of computational stylometry in internet security.

1 Meta-knowledge Extraction from Text

The form of a text is determined by many factors. Content plays a role (the topic of a text determines in part its vocabulary), and text type (genre, register) is important and will determine part of the writing style, but psychological and sociological aspects of the author will also be sources of stylistic language variation. Psychological factors include personality, mental health, and being a native speaker or not; sociological factors include age, gender, education level, and region of language acquisition. Writing style is a combination of consistent decisions in language production at different linguistic levels (lexical choice, syntactic structures, discourse coherence, ...) that is linked to specific authors or author groups such as male authors or teenage authors. It remains to be seen whether this link is consistent over time and whether there are style features that are unconscious and cannot be controlled, as some researchers have argued.
The basic research question for computational stylometry then seems to be to describe and explain the causal relations between psychological and sociological properties of authors on the one hand, and their writing style on the other. Theories of this kind can be used to develop systems that generate text in a particular style or, perhaps more usefully, systems that detect the identity of authors (authorship attribution and verification) or some of their psychological or sociological properties (profiling) from text. A limit hypothesis arising from this definition is that style is unique to an individual, like her fingerprint, earprint or genome. This has been called the human stylome hypothesis:
Similar resources
Stylometry with R: A Package for Computational Text Analysis
This software paper describes ‘Stylometry with R’ (stylo), a flexible R package for the high-level analysis of writing style in stylometry. Stylometry (computational stylistics) is concerned with the quantitative study of writing style, e.g. authorship verification, an application which has considerable potential in forensic contexts, as well as historical research. In this paper we introduce th...
Function Words in Authorship Attribution. From Black Magic to Theory?
This position paper focuses on the use of function words in computational authorship attribution. Although recently there have been multiple successful applications of authorship attribution, the field is not particularly good at the explication of methods and theoretical issues, which might eventually compromise the acceptance of new research results in the traditional humanities community. I ...
A Hybrid Statistical-Linguistic Model of Style Shifting in Literary Translation
The present paper presents an original interdisciplinary study of style-shifting in literary translation, which draws upon methodologies and techniques from corpus stylistics and computational stylometry, and relevant sociolinguistic theories of style variation. Such an innovative approach to the literary translator’s idiosyncratic use of language sets out to address one of the most difficu...
Detecting Style in Ancient Latin
Background: Stylometry is a field that uses statistical and computational techniques to study the style of authors. Stylometry is used to address questions of authenticity, authorship, and chronology, among others. Most famously, Mosteller and Wallace used statistical techniques to determine the disputed authorship of the Federalist papers. More recently in 2015, stylometric techniques ...
Author Identification: Using Text Mining, Feature Engineering & Network Embedding
Authorship analysis is a challenging area that has developed over centuries, with research widely scattered across multiple disciplines, mainly computational linguistics, text mining, data mining, stylometry and machine learning. Conventional techniques from the past relied heavily on stylometry and text-based content analysis of document text for authorship analysis. More recen...